THE TERM “Interview” as usually defined
means almost any face-to-face
contact between two or more individuals 
which involves the exchange of informa- 
tion. When the purpose of the interview 
is to obtain information about the per- 
son being interviewed, it is usually called 
a diagnostic or appraisal interview. It is 
with this type of interview that this in- 
vestigation is concerned. 


As pointed out by Fearing (6, 7), the appraisal 
interview is “probably one of the oldest human 
social techniques,” and is also “the least studied, 
the most constantly used, and the most fre- 
quently challenged method of securing social 
data.” 

Studies of the reliability of judgments based
on the interview (2, 7, 9, 10, 14, 15), although
extremely divergent with respect to method of
interview and the type of data selected for analysis,
demonstrate that these judgments will vary in
reliability according to: (a) the person doing the
interview; (b) the amount of objective data
available to the interviewer before the interview;
and (c) the degree of structuring or standardization
of the interview itself.

The validity of judgments based on the inter- 
view technique has been less adequately studied 
than has the reliability of interview judgments. 
In 1931, Symonds (17) pointed out that inter- 
viewing “has not been subjected to experimental 
scrutiny or statistical validation.” A survey of 
the literature by the present writer did not indi- 
cate that the situation has been changed since 
1931. In none of the studies (1, 4, 5, 10, 13) was 
any estimate made of the contribution of the 
interview itself to the validity of the interviewer's 
judgments. In every case the validities given 
were for judgments based on the interview 
plus other material. As a group, the validity 
studies indicated that little evidence has been 
gathered to support the belief, prevalent among 
psychiatrists and psychologists as well as per- 
sons in general who deal with other people, that 
face-to-face contact in an interview situation is 
very necessary (and sometimes sufficient) for meas- 
uring personality traits and predicting behavior. 


AN EVALUATION OF PERSONALITY-TRAIT RATINGS OBTAINED
BY UNSTRUCTURED ASSESSMENT INTERVIEWS


I. PROBLEM, PROCEDURES, AND METHODS


The present investigation is an attempt
to determine the validity of personality-trait
ratings based on two types
of unstructured interview situations and, 
further, to determine the increase in 
validity of interview ratings over that of 
ratings made by the same individuals 
without benefit of an interview. The 
data analyzed in this study were gathered 
during the 1947 Michigan Assessment 
Program.¹ If in some instances the data
appear incomplete or the results inconclusive,
it should be understood that the
Assessment Program was not designed²
to thoroughly investigate any particular 
technique but rather to determine the 
maximum validity of many techniques 
combined.


A. SUBJECTS

The sample investigated in this study 
consisted of 128 male college graduates 
who had been accepted by various uni- 
versities for training in clinical psychol- 
ogy leading to the Ph.D. degree under 
the Veterans Administration training 
program. All were beginning graduate 
students accepted at the P-1 level—the 
lowest professional grade in the U. S. 
Civil Service system. 

¹ The 1947 Michigan Assessment Program was
conducted as a part of the Research Project on 
the Selection of Clinical Psychologists sponsored 
by the Veterans Administration under a contract 
with the University of Michigan. For a detailed 
report of the overall aims and research design 
of this project, the reader is referred to the 
project’s Preliminary Report issued in Decem- 
ber, 1948 (11). 

² The writer, although associated with the
project since September, 1946, had no part in
the planning or experimental design of the 1947
Michigan Assessment Program and wishes to
accept no credit therefor. Neither does he wish
to be held responsible for certain justifiable
criticisms of the design such as the lack of
estimates of the reliability of interview ratings
and the lack of independence between interview
ratings and criterion ratings.


B. THE RATERS


The subjects were studied intensively 
over a period of one week by a staff of 
thirty clinicians. Two of these were 
psychiatrists. The majority of the others 
were professional clinical psychologists, 
some being advanced graduate students 
in clinical psychology. Ratings made by 
these clinicians comprise the data of this 
investigation. 


C. THE RATING SCALE 


The rating scale used in the Assessment 
Program comprises 42 variables divided 
into a Scale A (a group of 22 bi-polar 
so-called surface traits—supposedly the 
more manifest dimensions of personality),
a Scale B (a group of 9 so-called
source traits—supposedly the more under- 
lying dimensions of personality), and a 
Scale C (a group of 11 so-called criterion 
variables, which in reality are various 
skills believed necessary for successful 
functioning as a clinical psychologist). 
In addition, ratings were made on Lik- 
ing (Variable #0), defined simply as ex- 
tent of personal like or dislike for the 
subject. In rating students on Scales A 
and B the staff was instructed to rate the 
person as he was at the time of assess- 
ment. In rating on Scale C the student 
was to be rated as he would be five years 
in the future (i.e., when he would have 
one year of professional experience past 
the Ph.D. degree). Definitions of Scales 
B and C variables are given in the Ap- 
pendix. 

All three rating scales were the prod- 
uct of joint thinking on the part of the 
project planning committee and were 
the outgrowth of trial scales used in 
preliminary assessments the year before. 
Scale A was based in part on the findings 
of Cattell (3) in his factor analyses of 
personality ratings.

In using the rating scales, raters were
instructed to use as a frame of reference 
“all first year clinical psychology gradu- 
ate students at universities accredited by 
the American Psychological Association 
to offer training in clinical psychology.” 
Raters were asked to make their ratings 
conform roughly to a normal distribution 
with suggested frequencies designated as 
3, 7, 15, 25, 25, 15, 7, and 3 per cent, respectively,
for each point from 1 to 8.
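As an illustrative check on the suggested frequencies, the sketch below assumes the symmetric eight-step set 3, 7, 15, 25, 25, 15, 7, 3 per cent (an assumption chosen so that the eight scale points sum to 100) and converts it into expected counts for the 128 students. The function names are hypothetical and form no part of the original rating instructions.

```python
# Suggested rating frequencies (per cent) for scale points 1 through 8.
# ASSUMPTION: a symmetric eight-step distribution that sums to 100.
SUGGESTED_PERCENT = [3, 7, 15, 25, 25, 15, 7, 3]


def expected_counts(percentages, n_subjects):
    """Expected number of subjects at each scale point."""
    return [p * n_subjects / 100 for p in percentages]


def scale_mean(percentages):
    """Mean scale point implied by the percentage distribution."""
    total = sum(percentages)
    weighted = sum(point * p for point, p in enumerate(percentages, start=1))
    return weighted / total


total_percent = sum(SUGGESTED_PERCENT)
counts = expected_counts(SUGGESTED_PERCENT, 128)  # 128 students in the sample
mean_point = scale_mean(SUGGESTED_PERCENT)

print(total_percent, mean_point)
print(counts)
```

Under this assumption the implied mean rating is the scale midpoint, which offers a rough reference when reading mean ratings assigned by the raters.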


D. THE INTERVIEWERS 


A staff team of three members studied 
each student intensively. This team consisted
of an Initial Interviewer, an Intensive
Interviewer, and either a Test
Integrator or a Projective Integrator.
Each staff team studied four students 
from each class of 24 students. Sixteen 
staff members functioned as Initial and
Intensive Interviewers interchangeably 
so that for each student team of four stu- 
dents, one staff member would act as 
Intensive Interviewer for two students 
and as Initial Interviewer for the other 
two students. The group of 128 students 
studied in this investigation thus received 
two sets of interview ratings—one set 
made after the Initial interview (herein- 
after designated as Init ratings) and one 
set made after the Intensive interview 
(hereinafter called Intens ratings). Since 
the 16 interviewers acted in both the 
Initial and Intensive interviewer capacity 
for the 128 students (although no staff 
member functioned as both Intensive 
and as Initial Interviewer for the same 
student), the problem of interviewer dif- 
ferences does not arise when comparisons 
are made between ratings based on the 
Intensive interview situation and ratings 
based on the Initial interview situation.
Each of the interviewers interviewed
from two to twelve students in
the role of Initial Interviewer and a
similar number in the role of Intensive 
Interviewer. 

All 16 interviewers (2 psychiatrists and 
14 professional clinical psychologists) had 
had considerable interviewing experience 
before assessment. Without doubt they 
may be regarded as being more skilled 
at uncovering personality dynamics by 
use of the unstructured interview than 
the general population of interviewers. 
In fact all could undoubtedly be termed 
expert interviewers. 


E. THE INITIAL INTERVIEW SITUATION


The Initial Interview was an unstruc- 
tured type of interview lasting approxi- 
mately one hour. The material covered in 
this interview was purposely kept at a 
fairly superficial level so that little 
anxiety would be aroused in the subjects. 
The Initial Interviewer had available to 
him before the interview only the infor- 
mation about the subject usually found 
in a credentials file: i.e., application 
blank (Civil Service Form 57), letters of 
recommendation, and records of past 
performance in the form of college transcripts.
The Initial interview is probably
comparable to the usual college interview
for the purpose of determining
admission eligibility, and is similar to
the typical interview (although somewhat 
longer in duration) given applicants for 
employment in industry. 


F. THE INTENSIVE INTERVIEW SITUATION 


The Intensive interview was also un- 
structured. It lasted approximately two 
hours. The Intensive Interviewer had 
available before the interview the wealth 
of objective, projective, and autobio- 
graphical material described in detail be- 
low under Design of Assessment (Section 
I, G, 1). The Intensive interview was 
deep and probing in nature: the interviewer
attempted to uncover underlying
personality dynamics in order to con- 
firm hypotheses suggested by his review 
of the data available on the student, or 
to fill gaps which persisted in spite of 
all the data available. The Intensive 
interview is probably comparable to the 
interview used for diagnostic purposes 
with a neuropsychiatric patient by a psy- 
chiatrist who has available to him the 
social history, letters from friends and 
relatives, and psychological test reports. 


G. THE DESIGN OF ASSESSMENT


Lack of space prevents the presenting
of the experimental design of the 1947
Michigan Assessment Program in detail.³
However, some information concerning
the assessment is desirable so that the role
of the interviews may be seen in proper
perspective. Therefore, a typical student
(“X”) will be followed through the assessment
program. In addition, the various
assessment ratings will be described
in some detail.


1. The students’ assessment schedule. Students 
came to the assessment center in groups of 20 
to 24 for periods of seven days. Soon after his 
arrival at assessment headquarters, “X” and his 
classmates were given an orientation by the 
Project Director, at which time the aims of 
assessment were reviewed and the students as- 
sured that personal material would be kept in 
strictest professional confidence. “X” was also 
told that since certain staff members would be 
rating him on the basis of certain specified 
materials he would have absolutely no contact 
with those staff members until after those par- 
ticular ratings had been made. The students 
were then assigned to teams of four. 

During his first several days of assessment,
“X” took a battery of paper-and-pencil objective
tests (Miller Analogies Test, Chicago Test of
Primary Mental Abilities [Single Booklet Edition],
Allport-Vernon Study of Values, Strong
Vocational Interest Blank, Minnesota Multiphasic
Personality Inventory, and the Guilford-Martin
Battery: Inventory of Factors STDCR,
Inventory of Factors GAMIN, and Personnel
Inventory I). Scores on these instruments were
made available to the assessment staff. The assessee
was given a projective test battery consisting
of the Rorschach, the Bender-Gestalt, a sentence
completion test, and ten cards of the Thematic
Apperception Test. He filled out a biographical
inventory (consisting of 131 items), a psychological
experience record (for aid in evaluating objective
and projective tests), and wrote from an
outline a detailed autobiography (average length
about 20 pages). He was interviewed by an Initial
Interviewer, and later interviewed again by an
Intensive Interviewer.

³ For a complete description of the assessment
see the Preliminary Report (11) referred
to earlier.

On the fifth day of assessment, “X” was put
through a period of situation or role-playing
tests designed to bring out salient features of the
student’s interactions with others.⁵ He was observed
and rated at this stage by his own staff
team (the two interviewers and the Projective 
or Test Integrator), and was also observed and 
rated by another staff team who had never seen 
him before and who had no information avail- 
able about him. That evening, all students and 
staff relaxed with a social gathering. 

After breakfast on the last morning, “X” was
asked to fill out a sociometric questionnaire
(typical questions: “Who would you most like
to go on a camping trip with?” and “Which
student would you prefer to be supervised by in
your work?”) concerning his reactions to his
classmates. In addition, he was asked to rate
himself and his three teammates on Scales A,
B, and C (using a slightly modified form of the
staff rating form). Finally he was asked to
prepare concise but frank character sketches
about his teammates.

⁴ In addition to these objective tests, the students
took the Kuder Preference Record, the
ACE General Culture Test, and the Strong
Vocational Interest Blank twice more, once with
instructions to fill it out as it would be answered
by “women in general” and finally as it would
be answered by “men in general.” Scores on
these tests were not made available to the assessment
staff but were filed away for later
independent analyses.

⁵ The situation tests were patterned after
the psychodramatic (role-playing) technique of
Moreno. For a complete description and thorough
analysis of situation tests as used in
the 1947 Michigan Assessment Program, the
reader is referred to an article by Dr. William
Soskin (17). One typical situation test involved
two subjects, one of whom was informed in advance
that he was to play the role of a small-town
high school principal who had called
one of his male instructors in for a conference
regarding the instructor’s rumored sexual misconduct.
The other subject was informed that
he was to play the role of a high school instructor
who had been told the principal wished
to talk to him but was not informed what the
conference was about. The ensuing five or ten
minutes of role-playing consistently called forth
aspects of the students’ personalities that surprised
and enlightened even the participants.

When these tasks were completed, “X” had a 
final appointment with his intensive interviewer. 
Although he hoped to obtain from that inter- 
view the staff's opinion of him, he went away 
satisfied with a small amount of relatively in- 
nocuous data, such as test scores. 

Finally, “X” and his classmates met for a 
group-therapeutic session conducted by a visiting 
psychologist (who had taken no other part in 
the assessment program) and a farewell talk by 
the Project Director, who thanked the group 
for their cooperation and repeated the assur- 
ance that assessment findings would be kept 
in strictest confidence. 


2. Ratings made during assessment. 
The preceding section indicates the wide 
variety of psychological techniques used 
in assessment. Ideally, the program 
would have been designed so that each 
technique could be evaluated separately, 
with staff members rating independently 
each student on each technique. Un- 
fortunately, the expense of such a design 
(in terms of staff and student time as 
well as money) was prohibitive. As a 
compromise it was decided to make rat- 
ings on various combinations of psy- 
chological data and to design the assess- 
ment program so that even though esti- 
mates of the validity of ratings based on 
all techniques might not be obtained di- 
rectly, it would be possible to estimate 
the increase in validity of ratings obtained
with the addition of different psychological
data.⁶ Wherever practical, independent
ratings on various separate
psychological techniques were obtained
as well. In general, the psychological
data were utilized in order of increasing
cost; credentials material costing only a
postage stamp was presented to the raters
first; objective tests costing only slightly
more were presented second, etc.

⁶ Immediately after each set of ratings was
made, the ratings were filed away in the project
office and were not again made available to
any staff member until the Final Pooling Conference.
Similarly, notes made by any rater were
not available to any other rater nor were the
students discussed with other raters until the
Preliminary Conference. Thus, the Intensive
Interviewer, for example, when making ratings
after the Intensive interview had available only
the written test material and the notes he
himself had made during the interview. He had
no knowledge of the ratings or the opinions of
any other person.

The following ratings were made during
assessment:

a. Ratings based on each of the four
projective techniques made by the staff
member administering the particular
technique.

b. Ratings based on the four projective
protocols plus interpretations of the
protocols. These ratings were made by
the Projective Integrators on half the
students.

c. Ratings made on the other half of
the students by the Test Integrators.
These ratings were based on all projective
material and in addition on all credentials
material, all autobiographical
material and all objective test data.

d. Ratings made by the Interviewers.

(1) PreInit ratings. These ratings
were made by the staff member serving as
Initial Interviewer prior to the interview,
i.e., before the raters had seen the subjects.
These ratings were based entirely
on the material contained in the Credentials
folders.

(2) Init ratings. These ratings
were made by the Initial Interviewers
immediately after the Initial interviews.
Thus, they were based on Credentials
materials plus the information obtained
during the Initial interview.

(3) PreIntens ratings. These ratings
were the fourth set of ratings made
by the Intensive Interviewers before seeing
the subjects. The Intensive Interviewers
had previously rated on Credentials
material alone; then on Credentials plus
Objective test data; then on Credentials
plus Objectives plus Autobiographical
material. The PreIntens ratings were
based on Credentials material, Objective
test data, Autobiographical material,
and Projective protocol material.

(4) Intens ratings. These ratings 
were made by the Intensive Interviewers 
immediately after the Intensive inter- 
view. Thus, they were based on the data 
described under PreIntens ratings above 
plus the information obtained during 
the Intensive interview.

e. Ratings decided upon at the Preliminary
Pooling Conference which occurred
after the Intensive Interview. The
team of three staff members (the two 
interviewers and the Projective or Test 
Integrator) assigned to the student partic- 
ipated in this conference. All the data 
referred to above were available for this 
conference as were the notes taken by 
any of the three staff members prior to 
the conference. None of the ratings made 
previously on the student either by the 
three staff members or by any other staff 
members were available for the confer- 
ence. 

f. Ratings based on the Situation tests. 
These were of two types: the three sets
of individual Contaminated Situation 
ratings made by the student’s own staff 
team, and three sets made by another 
staff team who knew nothing about the 
student and thus were forced to base 
their ratings solely upon impressions 
gained from the Situation tests. In addi- 
tion, this latter staff team held a Situa- 
tion Pooling Conference at which they 
arrived at a set of Uncontaminated 
Pooled Situation ratings. 


g. The Final Individual ratings. On 
the last morning of assessment, each 
member of the student’s staff team re- 
viewed all material available on the stu- 
dent (including anything significant he 
might have gleaned during the social 
gathering following the Situation tests), 
and made a set of Final Individual rat- 
ings. 

h. Ratings by the students. On the 
last morning of assessment each student 
rated himself and his three teammates. 

i. The FinP ratings. During the last 
afternoon of assessment the student’s 
staff team met for a Final Pooling Con- 
ference at which was made available not 
only all psychological material gathered 
during assessment, but all ratings that 
had been made on or by the student. The 
FinP ratings arrived at during this con- 
ference thus represent the combined 
judgment of three clinicians who had in- 
tensively studied the subject for seven 
days and who had available a wealth 
of psychological data. 


H. ASSESSMENT DATA USED IN THIS
INVESTIGATION 

The PreInit, Init, PreIntens,⁷ and
Intens ratings described in the preceding 
section comprise the ratings whose validi- 
ties are estimated in this investigation. 
These ratings were validated against the 
FinP ratings. The estimated reliabilities 
of the FinP ratings are presented in 
Table 1 (in Section II, Results). Un- 
fortunately, the assessment design did 
not allow for any estimate of the relia- 
bilities of the predictor ratings. 

⁷ The other three sets of ratings made by the
Intensive Interviewers before the interview (see
Section G,2,d,3 above) are being analyzed elsewhere.
Their analyses are not included here
because these ratings did not seem especially
relevant to the main purposes of this study: the
investigation of the validity of ratings based on
interviews and increase in validity which can be
attributed to the interviews.

The present paper is concerned only
with the validity of ratings on Scales B 
and C. Scale A was not rated by either 
of the interviewers prior to the interview, 
and to present the validities of the in- 
terview ratings for Scale A would add 
nothing to the conclusions drawn from 
the investigation. 


I. THE CRITERION MEASURES 


Since criterion measures are of para- 
mount importance in any validity inves- 
tigation (and this is especially true in the 
area of personality measurement where 
good criteria are rare indeed), the FinP 
ratings should be examined in detail to 
determine whether they are acceptable 
criteria of the personality variables rated. 

The estimated coefficients of reliability 
of the FinP ratings (Column 5 of Table 
1) on the Scale B variables ranged from 
.73 to .90 (median .87); those on Scale C
from .88 to .93 (median .90). These reliabilities,
while not as high as is desirable
for individual evaluations, are satisfactory
for group measurements. Certainly,
they are of such magnitude as to
permit evaluation of predictive devices; 
i.e., failure of any predictor to correlate 
highly with the FinP ratings can only 
be interpreted as lack of validity of the 
predictor. 

In addition, as noted above, the FinP 
ratings represent the combined and 
pooled judgments of three staff members, 
selected on the basis of professional com- 
petence in the field of clinical psychology 
or psychiatry, who had intensively 
studied each subject for a period of one 
week, making use of a wide variety of 
psychological techniques and materials. 

Considering these facts, there would 
seem to be little doubt that the FinP 
ratings of the personality variables making
up Scale B are about as valid criterion
measures of these variables as are
attainable from trained clinicians with
present techniques.

The Scale C variables are a different 
matter. Scale C variables are not person- 
ality traits but rather predictions regard- 
ing future performance (five years later) 
in various aspects of the job of Clinical 
Psychology; Variable C42 was an ap- 
praisal of the overall suitability of the 
student for the field of clinical psychol- 
ogy. It is not believed, therefore, that 
Scale C FinP ratings are acceptable as 
criterion measures, even though they 
may represent “better guesses” than ratings
made earlier in the assessment program.
For this reason comparisons between
ratings on Scale C variables based
on the interview situations and FinP 
ratings will be considered merely as 
agreements between two sets of ratings. 

One factor which must be considered, 
if the FinP ratings are to be used as cri- 
terion measures, is the factor of con- 
tamination of the FinP ratings. Of the 
three persons comprising the staff team 
who made the FinP ratings, two were the 
interviewers whose ratings are here being 
investigated. If the role of either In- 
tensive Interviewer or Initial Interviewer 
was regarded consistently by the other 
two team members as being a “better” 
role for judging personality characteris- 
tics, then that interviewer would have a 
dominance in the pooling conference 
which would tend to influence the other 
two team members. The effect of this 
influence would be to consistently bring 
the FinP ratings closer to the ratings 
made by the interviewer than would be 
warranted by the actual merit of those 
ratings. Dominance by individuals apart 
from their roles may be disregarded since 
all interviewers acted in the dual capac- 
ity of Intensive and Initial Interviewers 
on the same staff team. Staff members
were rotated from week to week so that 
for each student class the staff teams were 
made up of different combinations of 
staff members; this also tended to reduce 
the effect of dominance by any particular 
individuals. 

Detailed evidence on the effect of such 
role dominance on the estimates of 
validity of the interviewer's ratings will 
be presented elsewhere (18). The evi- 
dence, while not conclusive, does indicate 
that role dominance was present; that 
the role of Intensive Interviewer was be- 
lieved by the other staff members to be 
especially advantageous for judging per- 
sonality variables; but that the effect of 
such dominance on the FinP ratings was 
slight. The possibility that Intensive In- 
terviewer validity coefficients are spuri- 
ous will be considered when the results 
of this investigation are discussed in 
Section III.


J. METHOD OF DETERMINING VALIDITY OF 
RATINGS BASED ON INTERVIEWS 


To determine the validity of ratings 
on the various traits made by the inter- 
viewers either before or after the inter- 
view, the rating given each subject by the 
interviewer on a particular variable was 
plotted against the criterion rating given 
that subject on the same variable. Pear- 
son product-moment correlation coeffici- 
ents were then computed from the scatter 
plots, using the conventional raw-score 
formula. The various sets of correlation 
coefficients obtained are given in Table 
1 appearing in the following chapter. 
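The computation described above can be sketched as follows. This is a minimal illustration of the conventional raw-score formula for the Pearson product-moment coefficient, r = (NΣXY − ΣXΣY) / √[(NΣX² − (ΣX)²)(NΣY² − (ΣY)²)]; the function name and the sample ratings are hypothetical and are not data from the Assessment Program.

```python
import math


def pearson_raw_score(x, y):
    """Pearson r computed from raw-score sums (no prior centering)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return numerator / denominator


# Hypothetical interviewer ratings and criterion ratings on an 8-point scale.
interviewer_ratings = [3, 4, 5, 6, 4, 5, 7, 2]
criterion_ratings = [4, 4, 5, 6, 3, 5, 6, 3]
example_r = pearson_raw_score(interviewer_ratings, criterion_ratings)
print(round(example_r, 3))
```

In the study each such coefficient would be computed once per variable and per set of predictor ratings, with the FinP rating on the same variable as the second series.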

To determine the contribution of the 
interview itself to the validity of the rat- 
ings made after the interview—that is, 
the contribution of the interview apart 
from the material available to the inter- 
viewer—correlation coefficients between 


8 ERNEST C. TUPES 


ratings made by the interviewer before 
the interview and the criterion ratings 
were squared and these squares subtracted 
from the squares of the coefficients of cor- 
relation between ratings made after the 
interview and the criterion ratings. The 
resulting figures may be thought of as the 
percentage of variance in the FinP ratings
which can be accounted for on the
basis of the interview. Since these figures
are a function of how validly the vari- 
able was rated before the interview (vari- 
ables more validly rated before the inter- 
view being more limited in the amount 
the correlation coefficient could increase 
after the interview) an additional correla- 
tion was made which will be discussed in 
Section II. 
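The subtraction of squared coefficients described above may be illustrated with a short sketch. The function names and the example coefficients (.30 before, .50 after) are hypothetical, chosen only to show the arithmetic; they are not values from Table 1.

```python
def variance_accounted_for(r):
    """Proportion of criterion variance accounted for by a correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


def interview_increment(r_before, r_after):
    """Difference of squared validities: after-interview minus before-interview."""
    return variance_accounted_for(r_after) - variance_accounted_for(r_before)


def max_possible_increment(r_before):
    """Ceiling on the increment, which shrinks as the pre-interview validity rises."""
    return 1.0 - variance_accounted_for(r_before)


# Hypothetical validities before and after an interview.
example_increment = interview_increment(0.30, 0.50)
example_ceiling = max_possible_increment(0.30)
print(round(example_increment, 2), round(example_ceiling, 2))
```

The second function makes the dependence noted in the text explicit: a variable already rated validly before the interview leaves less room for the squared coefficient to grow.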


II. RESULTS


THE MAIN findings of this investigation
are presented in Table 1. In
discussing the findings, those regarding
the Initial interview will be considered
first; those concerned with the Intensive
interview second; and those concerned
with agreement between Initial and Intensive
interviews last.


A. THE INITIAL INTERVIEW


The mean ratings¹ assigned by the Initial
Interviewers after the interview
ranged from 4.32 to 5.61 (median mean
rating 4.80). The criterion means ranged
from 3.68 to 5.56 (median 4.42). The
standard deviations of the Init ratings
ranged from 1.08 to 1.61 (median 1.41).
The standard deviations of the criterion
ratings ranged from 1.13 to 1.56 (median
1.32).

1. Validity of ratings made before the
Initial interview. In Column 1 of Table
1 are listed the correlations between the
ratings, based on Credentials material
only, made by the Initial Interviewers
before they had seen the subjects (PreInit
ratings) and the FinP ratings arrived at
by the staff team after their Final Pooling
Conference. These correlations are small
(only eleven are significantly different
from zero at the .01 level) and account
for only a small amount of the criterion
variance. As might be expected from the
nature of the Credentials material (job
histories, college transcripts, and the
like), the variables most validly rated are
those (B31, C36, and C39) concerned
with future intellectual performance.

¹ Only summary data concerning means and
standard deviations of interview and criterion
ratings are presented here. Complete tables of
means and standard deviations will be presented
in a forthcoming article (19). In the meantime,
copies of these tables may be obtained from
the author.

2. Validity of ratings made after the 
Initial interview. The validities of the 
ratings made by the Initial Interviewers 
after the Initial interview are given in 
Column 2 of Table 1, and the estimated 
validities of these ratings, if the criterion 
ratings were perfectly reliable, are listed 
in Column 6. 

It is evident that the Initial Inter- 
viewer after the interview was, on all 
variables, able to make ratings which 
correlated with the FinP ratings more 
closely than would have occurred had 
the Init ratings been made solely on a 
chance basis. All of the Init-FinP r’s are
larger than .23.

When the Init-FinP correlations are
corrected for unreliability in the FinP
ratings,² the increase in correlations is
negligible. The median r for Scale B is
raised from .42 to .47, and that for Scale
C from .46 to .49. The rank order correlation
(rho) between the corrected and uncorrected
r’s is .99, indicating that lack
of reliability in the FinP ratings had
little effect on the relative validity with
which the different variables were rated
by the Initial Interviewer. The rho’s between
the Init-FinP correlations and the
reliabilities of the FinP ratings are .17
for Scale B and .28 for Scale C. Apparently
differences in reliability of the FinP
ratings are only slightly related to differences
in correlations between Init ratings
and FinP ratings.

² The FinP reliabilities are given in Column
5 of Table 1. These reliabilities were estimated
in this manner:

The reliabilities of the Final Individual ratings
(made just before the Final Pooling Conferences)
were first estimated by computing r_ab, r_ac, and
r_bc, where a, b, and c are the sets of Final Individual
ratings made by the three team members,
respectively. The median correlation for each
variable was selected as the best indication of the
reliability of a single rater. The coefficients of
reliability for the FinP ratings were estimated by
the Spearman-Brown prophecy formula applied
to these median correlations.
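
The two-step procedure just described (a Spearman-Brown estimate of the pooled-rating reliability, followed by a correction of the validity coefficient for criterion unreliability) can be sketched as follows. This is only an illustration under stated assumptions: the three inter-rater correlations are hypothetical and are not taken from Table 1, the pooled rating is treated as the combined judgment of k = 3 raters, and the correction is the usual single correction for attenuation in the criterion alone. The function names are illustrative and do not come from the source.

```python
import math
from statistics import median


def spearman_brown(r_single: float, k: int) -> float:
    """Estimated reliability of a pooled rating from k raters,
    given the reliability of a single rater."""
    return k * r_single / (1 + (k - 1) * r_single)


def correct_for_criterion_unreliability(r_xy: float, r_yy: float) -> float:
    """Validity expected if the criterion ratings were perfectly reliable
    (correction for attenuation in the criterion only)."""
    return r_xy / math.sqrt(r_yy)


# Hypothetical correlations among the three raters' Final Individual ratings.
r_ab, r_ac, r_bc = 0.55, 0.60, 0.50

single_rater = median([r_ab, r_ac, r_bc])        # best single-rater estimate
pooled = spearman_brown(single_rater, k=3)       # assumed: three raters pooled

# An observed validity of .42 corrected with the estimated pooled reliability.
print(round(pooled, 2))
print(round(correct_for_criterion_unreliability(0.42, pooled), 2))
```

Because the correction divides by the square root of the criterion reliability, a pooled reliability near .80 raises a validity coefficient by only about a tenth of its value, which is in line with the small increases reported above.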

In order to test the significance of
differences between the Init-FinP r’s for
all 20 variables, the r’s were first converted
into Fisher’s z-functions by use of
a table given by Lindquist (12, p. 212),
and the significance of the difference between
each pair of z’s was estimated. All
possible combinations of the 20 variables
taken two at a time amount to 189 pairs
of differences. Thirty-eight of these differences
are significant (P = .05 or less).
By chance one would expect to find only
five differences out of each hundred (or
nine out of the 189) reaching the .05 level
of significance. The conclusion seems
justified that differences do exist in the
validities with which the different variables
were rated by the Init interviewer.³
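
A test of this kind can be sketched in a few lines. The sketch below is not the author's computation: it replaces the table look-up with the inverse hyperbolic tangent, treats the two correlations as if they came from independent samples (so the correlation between the z’s is ignored), assumes a sample of 128 cases, and uses a two-tailed normal test. The pair of r values is hypothetical.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)


def difference_test(r1: float, r2: float, n: int) -> tuple[float, float]:
    """Normal-deviate test of the difference between two correlations,
    each based on n cases and treated as independent (an assumption)."""
    se_diff = math.sqrt(2.0 / (n - 3))            # SE of a difference of two z's
    stat = (fisher_z(r1) - fisher_z(r2)) / se_diff
    p_two_tailed = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_two_tailed


# Hypothetical pair of validity coefficients; n = 128 is assumed.
stat, p = difference_test(0.60, 0.30, n=128)
print(f"z ratio = {stat:.2f}, P = {p:.4f}")
```

With about 128 cases, two validity coefficients must differ by a substantial margin before the difference reaches the .05 level under this approximation.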

³ This conclusion is rendered even more tenable
by the fact that some correlation might be
expected to be present between the z’s (since
the r’s on which the z’s are based were all
computed on the same sample of cases and hence
are not independent), but the amount of this
correlation is unknown and thus was ignored in
estimating the standard errors of the differences.
The effect of ignoring such correlation is to
overestimate the standard errors of the differences
and thus to underestimate the significance
of the differences.

3. The contribution of the interview
to the validity of ratings made after the
Initial interview. Comparison of the PreInit-FinP
correlations (based on ratings
before the interview) with the Init-FinP
correlations (based on ratings after the
interview) provides an opportunity to
estimate the incremental contribution
of the Initial interview to the validity of
the post-interview ratings. If the PreInit-FinP
r’s (Column 1 of Table 1) and the
Init-FinP r’s (Column 2 of Table 1) are
squared, the differences between these
squared r’s provide estimates of the amount of
FinP rating variance accounted for by
the Init ratings which can be attributed
to the Initial interview. However, the
FinP ratings differed in reliability and it
would seem that these “gain in variance
accounted for” figures might be more
representative of the contribution of the
interview were they corrected in some
manner for FinP reliability differences.
Therefore, an Init Net Gain figure was
computed: the PreInit-FinP r² was subtracted
from the FinP reliability coefficient to obtain
a “possible gain in variance
accounted for” figure, by which the
“gain in variance accounted for” figure
was divided. It is these relative Init Net
Gains which are listed in Column 8 of
Table 1.⁴

For illustrative purposes, an example
of the computation of the Init Net Gain
figure for Variable B29 follows: From
Table 1, it may be seen that for Variable
B29, the PreInit-FinP r is .02 and
the Init-FinP r is .38. The difference between
the squares of these r’s is .1440,
which can be interpreted as the amount
of the variance of the FinP ratings on
B29 accounted for by the Init ratings
which can be attributed to the Initial
interview. The reliability of the FinP
ratings on B29 is .90. Subtracting from
this the square of the PreInit-FinP r
(.0004), the “possible gain in variance
accounted for” figure .8996 is obtained.
Dividing .1440 by .8996 results in the Init
Net Gain figure .16, which is given in
Column 8 of Table 1.

⁴ Partial correlation coefficients could have
been used to determine the net contribution of
the interview to the validity of the post-interview
ratings, but when both pre-interview and
post-interview validity coefficients are large, a
greater difference between the first order r’s is
necessary to obtain the partial r which would
be obtained by a smaller difference between
first order r’s when their magnitude is smaller.
Partial correlation coefficients thus would cause
an underestimation of the relative contribution
of the interview when the pre-interview ratings
correlated higher with the criterion ratings. The
net gains are not influenced by the magnitudes
of the r’s, and in addition, by taking account of
the reliability of the FinP ratings, do not penalize
the interviewer’s ratings on those variables
where, because of lower FinP reliabilities, the
maximum Intens-FinP correlation possible is
actually less.
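
The computation in the worked example can be written as a small function. The sketch below only restates the arithmetic given above. The function and argument names are illustrative, and the input values (.02, .38, and .90) are those that reproduce the intermediate figures .1440 and .8996 quoted in the example.

```python
def net_gain(r_pre: float, r_post: float, criterion_reliability: float) -> float:
    """Relative net gain in criterion variance accounted for.

    gain     = r_post^2 - r_pre^2                (variance gained after the interview)
    possible = criterion_reliability - r_pre^2   (variance that could still be gained)
    """
    gain = r_post ** 2 - r_pre ** 2
    possible = criterion_reliability - r_pre ** 2
    return gain / possible


# Values from the worked example: .1440 / .8996
print(round(net_gain(r_pre=0.02, r_post=0.38, criterion_reliability=0.90), 2))  # 0.16
```

Dividing by the "possible gain" rather than by 1.00 is what keeps a variable with an unreliable criterion rating from appearing to have gained less than it could have.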

Examination of Column 8 indicates 
that there are appreciable differences in 
the Init Net Gains between the different 
variables, but interpretation of these dif- 
ferences is difficult. The two variables 
with highest net gains are B25 and B26, 
which were among the five variables 
(18) on which the Final Individual rat- 
ings of the Initial Interviewer had the 
highest (of the three team members) cor- 
relations with the FinP ratings, so role 
dominance may be a contributing factor. 
A common core underlying the variables
(B24, B28, B29, C33, and C37) which
gained relatively least from the Initial
interview becomes apparent when it is
considered that these variables all have
fairly high loadings on the same factor,⁵
one which was tentatively defined as social
intelligence, or the ability to effectively
use intelligence in interpersonal
relations. Apparently the credentials
folder furnished some clues to general
intelligence (college transcripts, work
history, etc.) which allowed the raters to
rate these variables with some success
before the interview. The interviewer
was able to obtain little information during
the Initial interview which added to
the estimate of intelligence already made,
and little other information which was
of help in estimating how effectively the
subjects were able to use their intelligence
in social situations.

⁵ This factor was obtained in an unpublished
factor analysis of the intercorrelations between
FinP ratings of Scales B and C carried out by
the Research Project on the Selection of Clinical
Psychologists.
B. THE INTENSIVE INTERVIEW


The mean ratings assigned by the In- 
tensive Interviewers after the interview 
ranged from 3.89 to 5.54 (median mean 
rating 4.65). The standard deviations of 
the Intens ratings varied from 1.14 to 
1.63 (median standard deviation 1.38). 

1. Validity of ratings made before the
Intensive interview. The correlations between
the PreIntens ratings (the ratings
made by the Intensive Interviewers just
before the interview, based on Credentials,
Objective test data, Autobiographical
material, and Projective protocols)
and the FinP ratings appear in
Column 3 of Table 1. With one exception
(Variable B25) the correlations are significant
at or beyond the .01 level.
As was found true for the PreInit
ratings, the variables most validly
rated by the Intensive Interviewer before
the interview are B31, C32, C36, and C39,
all variables which are concerned with
intellectual performance.

2. Validity of ratings made after the 
Intensive interview. In Column 4 of 
Table 1 are listed the validities of rat- 
ings made by the Intensive Interviewer 
after the Intensive interview. In Column 
7 appear the validities which would have 
resulted had the FinP ratings been per- 
fectly reliable. 

All of the Intens-FinP correlations attain
significance beyond the .01 level.

It should be noted that correcting the
Intens-FinP r’s for unreliability of the
FinP ratings raised the median r from
.61 to .67 for Scale B, and from .68 to .71
for Scale C. Even had the FinP ratings
been perfectly reliable the Intens ratings
would have done only a slightly better
job of predicting the FinP ratings. The
rank difference correlation coefficient,
rho, computed between the rank orders
of the uncorrected r’s and the r’s corrected
for attenuation is equal to .97, indicating
that the unreliability of the
FinP ratings had little effect on the rank
order of the validities with which different
variables were rated after the Intensive
interview. However, differences in
the reliability with which the variables
were rated by the staff team during the
Final Pooling Conference may be related
to differences in the validity with which
the variables were rated by the Intensive
Interviewer. The rho’s between the
rank orders of the FinP reliability coefficients
and the rank orders of the Intens-FinP
correlations are equal to .69 for
Scale B and .1g for Scale C, suggesting 
that for Scale B, trait differences in In- 
tensive Interview validities would be 
smaller were the FinP ratings of equal 
reliability. 
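
The rank-difference correlation used in this paragraph can be illustrated with a short sketch. The validity coefficients and reliabilities below are hypothetical and are not taken from Table 1. Rho is computed here as the product-moment correlation between average ranks, which agrees with the classical rank-difference formula when there are no ties. All names are illustrative.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank the values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Rank-difference correlation (rho) between two sets of values."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical uncorrected validities and criterion reliabilities.
uncorrected = [0.52, 0.61, 0.58, 0.70, 0.66]
reliability = [0.78, 0.85, 0.80, 0.90, 0.82]
corrected = [r / math.sqrt(rel) for r, rel in zip(uncorrected, reliability)]

print(round(spearman_rho(uncorrected, corrected), 2))
print(round(spearman_rho(reliability, uncorrected), 2))
```

A rho near 1.00 between corrected and uncorrected coefficients shows only that the correction preserved the ordering of the variables. It says nothing about how large the correction was.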

The Intens-FinP r’s were converted
into z’s and tested for significance of differences
between them in the manner described
for the Init-FinP r’s. Thirty-eight
pairs of differences of the total of 189 
pairs were found to be significant at the 
.05 level, indicating that differences do 
exist in the correlations between the 
Intens and FinP ratings. 

3. The contribution of the interview
to the validity of ratings made after the
Intensive interview. The relative Net
Gain figures for the Intensive interview
(obtained by the method described for
the Initial interview in Section II, A, 3
above) are shown in Column 9 of Table
1. The variables (B25, B30, C35, C38,
C40, and C42) to which the Intensive
interview makes the greatest relative contribution
seem to be those concerned
with social relations with others in the
professional work situation. The variables
to which the Intensive interview
contributes very little (B23, B27, B29,
C32, and C36) appear to be variables concerned
with effective intelligence or intellectual
efficiency. B29 (Social Adjustment)
does not fall into this class, but
this variable is relatively broad in defini- 
tion and has many aspects which render 
it poorly ratable on the basis of a face- 
to-face conversation. Also, even rela- 
tively poorly adjusted individuals might 
be able to put up a facade for the inter- 
view period. 

Role dominance may be a contribut- 
ing factor to the apparent contribution 
of the Intensive interview. That is, those 
variables which show relatively high Net 
Gains may do so, not because of any 
contribution of the interview itself, but 
because the other two team members, be- 
lieving the Intensive Interviewer to be 
the best informed on these variables, de- 
ferred to his judgments when determin- 
ing the FinP ratings. In fact, five of the 
six variables with the highest relative 
Intens Net Gains were variables for 
which the Final Individual ratings (just 
before the Final Pooling Conference) 
made by the Intensive Interviewer cor- 
related higher with the FinP ratings than 
did the Final Individual ratings made 
by either of the other two staff members 
(18). Of the five variables with the lowest 
relative Intens Net Gains only two were 
variables for which the Final Individual 
ratings of the Intensive Interviewer had 
higher r’s with the FinP ratings. 


C. COMPARISON OF THE INTENSIVE AND 
INITIAL INTERVIEWS 


1. Agreement between ratings made
after the interviews. In Column 10 of
Table 1 are listed the correlations between
the Init ratings and the Intens ratings.
Thirteen of the 20 r’s were significant
at the .01 level. Two facts seem of
especial interest here. First, the median r
for Scale B is .19, while that for Scale C is
.31. Apparently, the two interviewers
agreed better with each other on ratings 
of performance five years in the future 
than they did on ratings of present per- 
sonality traits. Second, for no variable 
did the ratings made after the interviews 
correlate with each other to a greater 
extent than either set of ratings cor- 
related with the FinP ratings. In addi- 
tion, it may be noted that the variables 
(B31, C32, and C36) on which the ratings 
made after the interviews showed the 
greatest agreement are among those 
which were most validly rated before the 
interview by both the Initial and the In- 
tensive Interviewers and which gained 
least in terms of relative net gains from 
either of the interviews. That is, those 
variables to the rating of which the inter- 
views contributed relatively little are the 
very ones on which there is most agree- 
ment after the interviews. 

2. Comparison of the Interviewers’ ratings
with respect to correlation with the
FinP ratings. In Columns 1 through 4
of Table 1 appear four sets of correlations
between Interviewers’ ratings and
FinP ratings:

The PreInit-FinP r’s (Column 1) are 
the correlations between the FinP ratings 
and ratings (based on Credentials ma- 
terial only) made by the Initial Inter- 
viewer before the interview. 

The Init-FinP r’s (Column 2) are correlations
between the FinP ratings and
ratings by the Initial Interviewer after 
the one-hour Initial interview. 

The PreIntens-FinP r’s (Column 3)
are correlations between the FinP ratings
and ratings by the Intensive Interviewer
just prior to the interview. In making
these ratings, the Intensive Interviewer
had available Credentials material, Objective
test data, Autobiographical material,
and Projective protocols.

The Intens-FinP r’s (Column 4) are
correlations between the FinP ratings 
and ratings by the Intensive Interviewer 
after a two-hour Intensive interview. 

These four sets of correlations permit 
of six comparisons, each of which yields 
some interesting information. 

a. The comparative validities of pre- 
interview ratings based on different 
amounts of written material. When the 
PreInit-FinP correlations are compared 
with the PreIntens-FinP, it is evident 
that the latter are appreciably higher for 
all variables. When it is considered that 
the PreInit ratings were based on Cre- 
dentials material while the PreIntens rat- 
ings were based on Credentials material 
plus considerable other psychological 
data, it seems indicated that the validity 
of personality-trait ratings will vary with 
the amount of psychological data made 
available to the raters. 

b. The comparative validities of rat- 
ings made after the Initial and Intensive 
interview situations. The Intens-FinP 
correlations were consistently higher 
than the Init-FinP correlations. Whether 
the relatively greater validity of ratings 
after the Intensive interview situation 
can be attributed to the greater amount 
of psychological data available to the In- 
tensive Interviewer, or to the greater 
length and depth of the Intensive inter- 
view, or to both factors, is a matter of 
speculation. The evidence presented be- 
low (Section II, C, 3) when the inter- 
views are compared with respect to rela- 
tive net gains suggests that most, if not 
all, of the apparent superiority of the 
Intensive interview situation is a result 
of the greater amount of written data 
available before the interview. 
c. The comparative validities of ratings
made before and after the interviews.
Four comparisons may be made
here. Two of these, the comparison of the 
validities of ratings before and after the 
Initial interview and before and after 
the Intensive interview have been made 
indirectly in the sections (II, A, 3, and 
II, B, 3, above) wherein the material con- 
cerning the net gains in validity at- 
tributable to the interview was pre- 
sented, so it need be only stated here 
that for all variables, ratings made after 
each type of interview correlated higher 
with the FinP ratings than did ratings 
made before that interview. The third 
comparison, that between PreInit-FinP 
correlations and Intens-FinP correlations, 
merits little consideration since it merely 
substantiates the rather obvious hypoth- 
esis that ratings based on a thorough 
interview plus a comprehensive mass of 
written psychological data are more valid 
than ratings based on only a little written 
data without benefit of an interview. 

The last comparison, that between the
Init-FinP correlations and the PreIntens-FinP
correlations, seems of such importance that it is
worthy of consideration separately.

d. Comparison of ratings made after 
the Initial interview with ratings made 
before the Intensive interview. Propo- 
nents of the appraisal interview frequent- 
ly state that the value of interviews lies in 
their flexibility. The argument goes that 
in the interview situation it is possible 
to follow up leads; to confirm or reject 
hypotheses regarding the personality 
dynamics of the interviewee; and thus to 
form much more accurate judgments 
than are possible on the basis of test data 
alone. Test scores, on the other hand, 
are rigid and inflexible. Even projective 
devices offer only hints to possible dy- 
namics. Thus, test material may have 
some value in assessing the more superficial
surface traits, but only by talking
to the subject, by asking him questions, 
and by listening to his answers, may the 
underlying, more fundamental, source 
traits be accurately appraised. 

Comparison of the Init-FinP correla- 
tions (Column 2 of Table 1) with the 
PreIntens-FinP correlations provides at 
least a partial test of this belief. The 
Init ratings were based on a one-hour 
interview (plus the background material 
available in the Credentials file). The 
PreIntens ratings were based entirely on 
written data (with no opportunity to 
talk to, or even see, the subject). The 
Scale B traits are source traits and hence 
supposedly more validly rated on the 
basis of an interview. 

This belief is not upheld by the present 
study. The median Init-FinP correlation
for Scale B traits is .42; the median 
PreIntens-FinP correlation is .49. The 
Init ratings are more valid for two of 
the Scale B traits; the PreIntens ratings 
are more valid for seven. 

Approximately the same relationship 
existed for the Scale C variables. The 
median Init-FinP r is .46; the median
PreIntens-FinP r is .57. The PreIntens 
ratings are more accurate for nine of 
the eleven variables. 

Thus, ratings based on only test and 
biographical data are more valid than 
ratings based on an interview plus the 
information contained in a credentials 
folder. 

3. Comparison of the interviews with 
respect to relative net gains. When the 
relative net gains in correlations between 
ratings which can be attributed to the 
interviews (Columns 8 and 9 of Table 1) 
are compared, the Intensive Interviewer 
seems to gain slightly more from the
interview than does the Initial Interviewer,
although the difference is neither
statistically nor practically significant.
The median net gain is .24 for the Intensive
Interviewer and .16 for the Initial
Interviewer. The fact that the Intensive
Interviewer appears to have been
considered by the other two staff members
as being in a somewhat better position
to rate many of the variables renders
even this small difference questionable.


III. DISCUSSION, CONCLUSIONS, AND SUGGESTIONS
FOR FURTHER RESEARCH


THE PRESENT investigation indicates
that interviews apparently do contribute
to the validity with which personality
variables are rated (i.e., the assessment
raters typically made more valid
ratings on the basis of an interview plus
psychological data than they did on the
basis of psychological data alone). That
contribution is slight, however, and may
be even less than has been demonstrated
here. It should be remembered that two
of the three criterion team members
were the interviewers whose ratings
were studied. Were the criterion measures
truly independent (had none of the
criterion team members been interviewers),
it does not seem likely that the
validity of the interviewers’ ratings
would have been as high as was found in
this study. On the other hand, it is possible
that had the interviews occurred
first (with psychological data presented
to the interviewer after he had already
rated on the basis of the interview), their
validity coefficients might have been
higher, with the psychological test data
yielding only a slight additional validity
to the interview ratings already made.
That is, the psychological data when presented
first as in this investigation may
have given the interviewers a set so that
they were unable to change their ratings
after the interview. It seems unlikely
that any such persistence of initial impressions
actually did decrease the effective
validity of the ratings after the interviews
for these reasons: (1) The Initial
Interviewers, who had a minimum of
psychological information (only the credentials
files) available before the interview,
did not gain more from the interview
than did the Intensive Interviewers
who had much psychological data available
before the interview and thus had
opportunity to form stronger initial impressions.
(2) Ratings based on unstructured
appraisal interviews, unless used
in conjunction with some sort of objective
psychological data, have been shown
to be unreliable, and hence invalid. Additional
research is necessary for a definite
answer to this question. It is unfortunate
that the assessment program could not
have been designed to answer this and
the similar questions arising when the
validity of any of the many assessment
techniques is investigated, but, of course,
all possible permutations of the various
techniques would require several hundred
summers of assessments. Since but
one assessment was possible it was actually
designed on the basis of increasing
cost of technique. Thus, credentials costing
only postage were made available to
the staff first, objective tests second, the
interviews (fairly expensive since they
required both students and interviewers
to be present) later, and last the situation
tests requiring groups of students and
groups of staff members.

When the two types of interview situa- 
tions are compared with respect to valid- 
ity of ratings made by the interviewers 
after the interviews, it is apparent that 
the Intensive Interviewer was typically 
able to make ratings which were more 
valid than were ratings made by the Ini- 
tial Interviewer. When the two types of 
interview situations are compared with 
respect to relative net gains this relationship
still exists, but the difference becomes
statistically insignificant. In addition,
when it is considered that the Intensive
Interviewer may have been considered
by the other two team members
to be the person in the best position to
rate, a large portion of this difference in
validity would seem to disappear.

There appears, then, to be little differ- 
ence in the true relative contributions of 
the two types of interview situations. The
Intensive interview, which was two hours 
in length and in which an attempt was 
made to probe for underlying personality 
dynamics, contributed little, if any, more 
than did the Initial interview, which 
lasted only one hour and which was os- 
tensibly devoted to the gathering of 
rather superficial information. 

Since differences in the interviews 
themselves cannot account for the differ- 
ences in validity of ratings after the two 
interview situations, these differences 
would seem to be functions of the differ- 
ences in the amount of psychological 
test data available to the interviewers for 
study before the interview. This is 
further evidenced by the fact that the 
PreIntens validities were higher than the 
PreInit validities. 

Of equal significance is the fact that 
ratings based on the one-hour interview 
plus credentials (the Init ratings) cor- 
related less well with the criterion ratings 
than did the PreIntens ratings—ratings 
which were based on a varied and com- 
prehensive set of psychological data but 
without any interview. Apparently, 
skilled clinicians can use fairly complete 
psychological test data in the rating of 
personality traits slightly more effectively 
than they can use an interview, even
when the interview is aided by historical 
material in the form of credentials. Un- 
fortunately, neither set of ratings has 
much effective validity. The median validity
of the Init ratings for all twenty
variables is .45 and the median validity
of the PreIntens ratings is .49. Such correlation
coefficients account for less than
25% of the variance in the criterion ratings,
and indicate the predictor ratings to
be only about 25% more valid than
chance ratings, and this despite the fact that
the criterion ratings are based on judgments
of the same persons who interpreted
the psychological data and conducted
the interviews. If the Initial interview
situation is in fact (as suggested in
Section I, E) comparable to the typical 
personnel selection or college admission 
interview, it would appear these inter- 
views have little actual value. 

Combination of comprehensive psy- 
chological data and a two-hour probing 
interview results in Intensive interview 
ratings whose median correlation with 
the criterion ratings is about .63, account- 
ing for about 40% of the criterion vari- 
ance. This is a sizeable increase over the 
20% accounted for by ratings based on 
psychological data alone, but still leaves 
the greater percentage of the criterion 
rating variance unaccounted for—a per- 
centage which would probably be still 
greater were there not a personnel over- 
lap between the persons making the pre- 
dictor and the criterion ratings. 
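
The percentages quoted in this discussion follow from squaring the median validity coefficients. The short sketch below shows that arithmetic for the three medians mentioned in the text (.45, .49, and about .63). The labels and the helper name are illustrative only.

```python
def variance_accounted_for(r: float) -> float:
    """Proportion of criterion variance accounted for by a predictor
    whose correlation with the criterion is r."""
    return r * r


# Median validity coefficients quoted in the discussion.
medians = {
    "Init (credentials plus one-hour interview)": 0.45,
    "PreIntens (written psychological data only)": 0.49,
    "Intens (written data plus two-hour interview)": 0.63,
}

for label, r in medians.items():
    share = variance_accounted_for(r)
    print(f"{label}: r = {r:.2f}, variance accounted for = {share:.0%}")
```

Squaring makes plain how much of the criterion variance remains unaccounted for even at the highest of these medians.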

Validity coefficients of .65 are con- 
sidered fairly satisfactory for group tests 
used for selection purposes, but for indi- 
vidual personality assessment somewhat 
higher validities are needed. Especially 
is this true in the areas of clinical psy- 
chology and psychiatry where decisions 
of basic importance to clients and pa- 
tients must be made—and frequently are 
made on the basis of much less complete 
psychological data than were available in 
the present instance.


A. CONCLUSIONS 


The conclusions to be drawn from this 
investigation have been implicit throughout
this and the preceding chapter. Explicitly
stated they are:

1. The contribution of the unstruc- 
tured appraisal interview to the validity 
of personality-trait ratings based on an 
interview plus psychological data is 
slight. 

2. Longer interviews with the objec- 
tive of uncovering personality dynamics 
contribute little, if any, more to the 
validity of personality-trait ratings than 
do shorter interviews whose main objec- 
tive is the eliciting of information. 

3. The more comprehensive the psy- 
chological data available, the more valid 
will be personality-trait ratings based on 
that data. 

4. Personality-trait ratings based on 
fairly comprehensive psychological test 
data are more valid than are ratings 
based on interviews with only a little 
psychological data in the form of creden- 
tials material available. 

5. At the present time, even skilled 
clinicians, having available for study and 
integration a wide variety of objective 
test data, projective protocols, creden- 
tial material, and autobiographical data, 
plus data accruing from a probing face- 
to-face interview, do not appear able to 
make personality assessments, in the form 
of ratings of personality traits, which 
have sufficient validity to be as useful in 
individual case work as is desirable. 


B. SUGGESTIONS FOR FURTHER RESEARCH 


This investigation, although it represents
a definite contribution to the study
of the validity of personality-trait ratings
based on interviews, in that it provides 
an estimate of the increment in validity 
which may be attributed to the inter- 
view itself, has left at least three impor- 
tant questions unanswered: 

1. The previously mentioned problem 
of whether the presentation of psycholog- 
ical data before the interview formed a 
“set” in the interviewer which prevented 
his post-interview ratings being changed 
sufficiently to reflect the true validity of 
the interview. 

2. The interview as a selection instru- 
ment could not be evaluated here since 
the Final Pooled ratings of Scale C vari- 
ables did not appear to be acceptable as 
selection criteria. A similar study is neces- 
sary to determine the incremental valid- 
ity of interview ratings of job-success 
variables when those ratings are evalu- 
ated against acceptable criteria of job- 
success. 

3. Although this investigation was 
able in some measure to estimate the in- 
cremental validity of the interview when 
used to assess personality variables, it is 
probable that such incremental validity 
has been overestimated inasmuch as the 
criterion measures themselves were deter- 
mined by the persons who originally 
made the interview ratings. It would 
seem desirable, then, that another study 
be carried out so that the incremental 
validity of the interview in the assessment 
of personality traits might be evaluated
against truly independent criteria. 
IV. SUMMARY


A. OBJECTIVES OF STUDY


The main objectives of this study were: 

1. To determine the relative validity 
of ratings of personality traits based on 
two types of interview situations, differ- 
ing in amounts and kinds of material 
available to the interviewer and in the 
length of the interview. 

2. To determine the incremental val- 
idity of each of the two types of inter- 
view situations, i.e., how much more 
valid were interview ratings than other 
ratings made without benefit of the in- 
terviews? 

Other related problems were also in- 
vestigated. 


B. METHODS AND PROCEDURES 


A total of 128 male first-year gradu- 
ate students majoring in clinical psy- 
chology at some 30 universities were in- 
terviewed in two types of interview situa- 
tions; one, an Initial Interview lasting 
one hour, with only credentials material 
available before the interview for study 
by the interviewer; and two, an Intensive 
Interview lasting two hours, with very 
comprehensive psychological test data 
available for study by the interviewer 
before the interview. The interviews were 
conducted by 16 clinicians (2 psychia- 
trists and 14 psychologists) who had had 
considerable prior interviewing experi- 
ence, each interviewer acting interchange- 
ably as Initial Interviewer and as Inten- 
sive Interviewer. 

The subjects were rated by the inter- 
viewers before each interview on nine 
personality-trait variables and eleven 
“future performance” variables, and after 
the interview on 31 personality-trait vari- 
ables and eleven “future performance” 
variables. Four sets of ratings were thus
available for each subject: the Pre-Initial
interview ratings, based on credentials 
material alone; the Initial interview rat- 
ings, based on credentials material plus 
an Initial interview; the Pre-Intensive in- 
terview ratings, based on comprehensive 
written psychological data; and the In- 
tensive interview ratings, based on com- 
prehensive psychological data plus an In- 
tensive interview. 

The validity of each of the four sets of 
ratings (for each variable) was estimated 
by correlating each set of ratings with 
criterion ratings arrived at by a team of 
three psychologists who had intensively 
studied each subject for a period of one 
week. The incremental validity of each 
type of interview was estimated by com- 
paring the validity of ratings made just 
before and just after the interview, and, 
in addition, by comparing ratings based 
on the Initial interview situation with 
the Pre-Intensive ratings based on com- 
prehensive written psychological data but 
without an interview. These estimates of 
validity and incremental validity are 
probably spuriously large, due to person- 
nel overlap between interviewers and 
criterion teams. 
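
The computation described above can be sketched as follows. This is a minimal illustration, assuming that validity is the Pearson product-moment correlation between a set of ratings and the criterion ratings, and that incremental validity is the simple difference between the post-interview and pre-interview validity coefficients; this summary does not state the exact statistic used. The function names and the ratings are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


def incremental_validity(pre, post, criterion):
    """Validity of post-interview ratings minus validity of pre-interview ratings."""
    return pearson_r(post, criterion) - pearson_r(pre, criterion)


# Hypothetical ratings on the 1-8 scale for six subjects on one variable.
# These numbers are invented for illustration and are not data from the study.
criterion = [2, 4, 5, 6, 7, 8]
pre_interview = [4, 3, 5, 5, 6, 6]
post_interview = [3, 4, 5, 5, 7, 7]

validity_pre = pearson_r(pre_interview, criterion)
validity_post = pearson_r(post_interview, criterion)
gain = incremental_validity(pre_interview, post_interview, criterion)
print(f"pre: {validity_pre:.2f}  post: {validity_post:.2f}  gain: {gain:+.2f}")
```

The same difference can be taken between the Initial interview ratings and the Pre-Intensive ratings to reproduce the second comparison described above, in which a negative value would mean that written data alone outperformed the interview.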


C. FINDINGS 


In summary it was found that: 

1. Ratings made after both types of 
interview correlated significantly with 
the criterion ratings. 

2. Significant differences existed in the 
relative validity with which the different 
variables were rated. 

3. Criterion unreliability decreased 
only slightly the validity of interview rat- 
ings. 

4. Ratings made after the Intensive 
interview were more valid than ratings 
made after the Initial interview. 

5. Ratings made after each type of 
interview correlated significantly with 
ratings made after the other type of 
interview, but these correlations were 
lower than correlations between each set 
of ratings and the criterion ratings. 

6. Ratings after each type of interview 
correlated higher with the criterion 
measures than did ratings made before 
the interview, but there appeared to be 
no differences between the two inter- 
view situations (when role dominance 
is considered) in terms of gain in valid- 
ity of ratings after the interview over 
ratings before the interview. 

7. Ratings made by the Intensive In- 
terviewer before the interview (based on 
written material alone) were slightly 
more valid than ratings made by the 
Initial Interviewer after the interview. 
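
The finding on criterion unreliability can be made concrete with the standard correction for attenuation applied to the criterion alone. That this particular formula was the one used is an assumption, since the summary does not state it. Here $r_{xy}$ is the observed correlation between ratings and criterion, and $r_{yy}$ is the reliability of the criterion ratings:

\[
r'_{xy} = \frac{r_{xy}}{\sqrt{r_{yy}}}
\]

With hypothetical values $r_{xy} = .40$ and $r_{yy} = .90$, the corrected coefficient is $.40 / \sqrt{.90} \approx .42$, a change small enough to be described as slight.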


D. CONCLUSIONS 


In summary it was concluded that: 

1. The incremental validity of the un- 
structured interview when used in the 
assessment of personality is slight, and 
is even slightly negative when ratings 
based on an interview plus a minimal 
amount of psychological data are com- 
pared with ratings based on comprehen- 
sive written data but without an inter- 
view. 

2. Longer, probing interviews have 
little more incremental validity than 
short “information-eliciting” interviews. 

3. The more comprehensive the psy- 
chological data available, the more valid 
will be personality-trait ratings based on 
such data. 

4. At the present time even skilled 
clinicians, basing their judgments on 
comprehensive psychological data plus 
an interview, do not appear able to make 
personality-trait ratings with sufficient 
validity to be as useful in individual case 
work as is desirable. 
APPENDIX 
RATING SCALE DEFINITIONS 


SCALE B (Variables Nos. 23-31) 

Note: For ratings on this scale, 1 = left side or low, 8 = right side or high. 

Since many of the following attributes (#23-30) are broad factors, it is unlikely that any person 
will fit all the phrases grouped together at one pole of a given variable. Note also that for some 
items neither extreme necessarily represents a desirable attribute. 

23. Social Adjustment: How well does he adjust to varied interpersonal situations? (Includes sexual 
adjustment only as it affects social adjustment in general.) 


Low (left pole): Acts without consideration for feelings of others; often rejected by others, often 
appears aloof, hostile, or irritable. 

High (right pole): Actively considers feelings of others; readily gains acceptance in interpersonal 
relationships; maintains a friendly and likeable manner. 


24. Appropriateness of Emotional Expression: How appropriate are his emotional responses to the 
situation? 


Low (left pole): Fails to adapt his emotional responses to the needs of the situation; shows 
disorganized or overly constricted emotional responses. 

High (right pole): Shows emotional responses of a quality and intensity befitting the situation; 
reacts spontaneously but appropriately; shows well-integrated and flexible patterns of emotional 
behavior. 


25. Characteristic Intensity of Inner Emotional Tension: How intense is his inner emotional life 
as inferred from all available clues? 


Low (left pole): Inner emotional life characterized by a minimum of persistent internal tensions. 

High (right pole): Has strongly repressed emotional drives resulting in inner turmoil; great inner 
conflict and strong pent-up emotions. 


26. Sexual Adjustment: To what degree do his sexual needs and activities affect his overall adjust- 
ment? 


Low (left pole): His sexual needs and activities seriously interfere with his overall adjustment. 

High (right pole): His sexual needs and activities definitely enhance his overall adjustment. 


27. Motivation for Professional Status: How strong is his drive for the status-rewards of a pro- 
fessional career? 

28. Motivation for Scientific Understanding of People: How strong are his drives toward acquiring 
the facts, theories, and skills necessary for the scientific understanding of individual human beings? 

29. Insight into Others: How much insight does he have into the attitudes, emotions, and motiva- 
tions of others? 


Low (left pole): Interprets behavior at its face value; insensitive to any but gross differences in 
behavior; does not develop any integrated understanding of behavior or of people. 

High (right pole): Has good awareness of underlying dynamics of behavior; is sensitive to subtle 
nuances of behavioral responses; is able to develop integrated understanding of the behavior of 
people. 


30. Insight into Himself: How much insight does he have into the underlying dynamics of his 
own attitudes, emotions and motivations? 

31. Quality of Intellectual Accomplishments: What is the characteristic quality of his intellectual 
output? 


Low (left pole): Intellectual work is characteristically of low quality. 

High (right pole): Characteristically produces intellectual work of high quality. 


SCALE C: CRITERION SKILLS (Variables Nos. 32-42) 

Note: For ratings on this scale, 1 = low, 8 = high. 

Ratings on No. 32 refer to performance in graduate school; ratings on Nos. 33-42 refer to student’s 
performance five years hence (i.e., after one year of experience past the Ph.D.). 

What will be his level of competence or skill in the varied aspects of: 

32. Academic Performance (during next three or four years): How well will he: 

Effectively master course content, successfully complete courses in general psychology, clinical 
psychology, statistics, and related fields; satisfy language requirements for the doctorate; pass general 
examinations. 

33. Clinical Diagnosis: How well will he: 

Recognize dynamics underlying particular responses in both objective and projective tests, observe 
significant interrelationships among responses, relate findings to case history and other clinical data. 

Elicit from the patient information required for mental status examinations and case histories; 
ascertain and evaluate attitudes and incidents of psychological significance in the patient. Synthesize 
clinical findings to arrive at an integrated picture of personality development, structure, and 
function. 

34. Individual Psychotherapy: How effectively will he: Conduct various types of individual psycho- 
therapy. 

35. Group Psychotherapy: How effectively will he: Conduct various types of group psychotherapy. 

36. Research: How well will he: 

Recognize and define important research problems in clinical psychology; critically evaluate and 
apply the research findings of others; think with originality and scientific rigor; employ appropriate 
experimental design and statistical methods; grasp practical implications of findings; present results 
and conclusions in clear, comprehensive, and well-organized form. 

37. Administration: How well will he: 

Plan and develop psychological programs; make proper administrative decisions; delegate responsi- 
bility appropriately; elicit cooperation from subordinates and superiors; maintain high morale among 
his staff; carry out or direct an appropriate public relations program. 

38. Supervising Clinical Psychologists: How well will he: 

Carry out the professional supervision of subordinates assigned to him for duty and on-the-job 
instruction; assign their duties; evaluate their performance; instruct them in clinical techniques; 
perform other aspects of in-service training. 

39. Teaching Psychology (in a College or University): How well will he: 

Teach college courses in general psychology; motivate students; present concepts and procedures; 
stimulate critical thinking about and integration of course materials; evaluate the products of learn- 
ing. 

40. Professional Interpersonal Relations: How well will he: 

Work cooperatively with superiors, subordinates, members of the mental team, and other pro- 
fessional personnel concerned with the patient’s welfare; participate in the give-and-take of staff 
conferences; contribute to group decisions. 

41. Integrity of Personal and Professional Behavior: How well will he: 

Recognize and fulfill professional responsibilities; live up to personal commitments; show loyalty 
to professional obligations in the event of outside pressure or promise of personal gain; maintain 
discretion concerning professional matters; appropriately conform with commonly accepted standards 
of moral and social behavior; refrain from coloring facts, evasion, lying, etc. 

42. Overall Suitability for Clinical Psychology: In view of his assets and liabilities, how well will 
he be able to: 

Carry out the several duties—diagnosis, therapy, and research—specified for the position of clinical 
psychologist (P-4 and above) in the Veterans Administration. 